Term Frequency Based Cosine Similarity Measure for Clustering Categorical Data using Hierarchical Algorithm
Similar resources
Occurrence Based Categorical Data Clustering Using Cosine and Binary Matching Similarity Measure
Clustering is the process of grouping a set of physical objects into classes of similar objects. Objects in the real world consist of both numerical and categorical data. Categorical data are not analyzed as numerical data because of the absence of inherent ordering. This paper describes an occurrence-based categorical data clustering (OBCDC) technique based on the cosine similarity measure and simple...
Survey on Clustering Algorithm and Similarity Measure for Categorical Data
Learning is the process of generating useful information from a huge volume of data. Learning can be either supervised (e.g. classification) or unsupervised (e.g. clustering). Clustering is the process of grouping a set of physical objects into classes of similar objects. Objects in the real world consist of both numerical and categorical data. Categorical data are not analyzed as n...
A Rough Set-Based Hierarchical Clustering Algorithm for Categorical Data
In this paper, rough set theory is applied to clustering analysis. The clustering decision table is formed by introducing a decision attribute into the data table, thereby defining the attribute membership matrix. The consistent degree and aggregate degree are presented, and their functions in the clustering process are analyzed in depth. The clustering level calculation formula ...
Incremental Algorithm to Cluster the Categorical Data with Frequency Based Similarity Measure
Clustering categorical data is more complicated than clustering numerical data because of its special properties. Scalability and memory constraints are challenging problems in clustering large data sets. This paper presents an incremental algorithm to cluster categorical data. Frequencies of attribute values contribute much to clustering similar categorical objects. In this paper we propose...
Holo-Entropy Based Categorical Data Hierarchical Clustering
Clustering high-dimensional data is a challenging task in data mining, and clustering high-dimensional categorical data is even more challenging because it is more difficult to measure the similarity between categorical objects. Most algorithms assume feature independence when computing similarity between data objects, or make use of computationally demanding techniques such as PCA for numerica...
Journal
Journal title: Research Journal of Applied Sciences, Engineering and Technology
Year: 2015
ISSN: 2040-7459, 2040-7467
DOI: 10.19026/rjaset.11.2043